Multi-target Extractor and Detector for Unknown-number Speaker Diarization

Authors

Abstract

Strong representations of target speakers can help extract important information about speakers and detect the corresponding temporal regions in multi-speaker conversations. In this study, we propose a neural architecture that simultaneously extracts speaker representations consistent with the speaker diarization objective and detects the presence of each speaker on a frame-by-frame basis, regardless of the number of speakers in a conversation. A speaker representation (called z-vector) extractor and a time-speaker contextualizer, implemented by a residual network that processes data in both the time and speaker dimensions, are integrated into a unified framework. Tests on the CALLHOME corpus show that our model outperforms most of the methods proposed so far. Evaluations in a more challenging case, with simultaneous speakers ranging from 2 to 7, show that our model achieves 6.4% to 30.9% relative error rate reductions over several typical baselines.
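The abstract describes the detector only at a high level. The following is a minimal NumPy sketch, under our own assumptions, of how a time-speaker contextualizer could turn frame features and a variable number of speaker vectors into frame-by-frame presence probabilities. The function names (`detect`, `time_mix`, `speaker_mix`, `make_params`), the pairing of frame and speaker features by concatenation, the depthwise temporal convolution, and the mean-pooled speaker context are all hypothetical choices, not the authors' architecture, and the weights are random rather than trained.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def time_mix(h, kernel):
    """Depthwise 1-D convolution along the time axis with zero 'same' padding.
    h: (S, T, C) hidden states; kernel: (K, C) with K odd."""
    k, t = kernel.shape[0], h.shape[1]
    pad = k // 2
    hp = np.pad(h, ((0, 0), (pad, pad), (0, 0)))
    out = np.zeros_like(h)
    for i in range(k):
        out += hp[:, i:i + t, :] * kernel[i]  # shifted frames, one weight per channel
    return out

def speaker_mix(h, w_self, w_ctx):
    """Permutation-equivariant mixing along the speaker axis: each speaker row
    combines its own features with the mean over all speakers, so the same
    weights work for any number of speakers S."""
    ctx = h.mean(axis=0, keepdims=True)   # (1, T, C), shared by all speakers
    return h @ w_self + ctx @ w_ctx       # broadcasts back to (S, T, C)

def make_params(d, c, seed=0):
    """Hypothetical random (untrained) weights; a real model would learn these."""
    rng = np.random.default_rng(seed)
    return {
        "w_in": rng.normal(scale=0.1, size=(2 * d, c)),
        "kernel": rng.normal(scale=0.1, size=(3, c)),
        "w_self": rng.normal(scale=0.1, size=(c, c)),
        "w_ctx": rng.normal(scale=0.1, size=(c, c)),
        "w_out": rng.normal(scale=0.1, size=(c, 1)),
    }

def detect(x, z, params, n_blocks=2):
    """x: (T, D) frame features; z: (S, D) speaker vectors (stand-ins for z-vectors).
    Returns (S, T) per-speaker, per-frame presence probabilities."""
    s, t = z.shape[0], x.shape[0]
    joint = np.concatenate(
        [np.broadcast_to(x[None, :, :], (s, t, x.shape[1])),
         np.broadcast_to(z[:, None, :], (s, t, z.shape[1]))],
        axis=-1,
    )                                      # (S, T, 2D): every frame paired with every speaker
    h = joint @ params["w_in"]             # (S, T, C)
    for _ in range(n_blocks):              # weights shared across blocks for brevity
        h = h + relu(time_mix(h, params["kernel"]))                      # residual, time axis
        h = h + relu(speaker_mix(h, params["w_self"], params["w_ctx"]))  # residual, speaker axis
    return sigmoid(h @ params["w_out"])[..., 0]

rng = np.random.default_rng(1)
params = make_params(d=16, c=8)
x = rng.normal(size=(100, 16))             # 100 frames of 16-dim features
for n_spk in (2, 7):
    probs = detect(x, rng.normal(size=(n_spk, 16)), params)
    active = probs > 0.5                   # frame-level activity decision per speaker
    print(n_spk, probs.shape, active.sum(axis=1))
```

Because the speaker-axis mixing uses only each speaker's own features plus a mean over speakers, the same weights apply to 2 or 7 speakers, and permuting the input speakers permutes the output rows. This is one simple way to obtain the "regardless of the number of speakers" property the abstract mentions.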


Similar resources

Multi-stage Speaker Diarization for Conference and Lecture Meetings

The LIMSI RT-07S speaker diarization system for the conference and lecture meetings is presented in this paper. This system builds upon the RT06S diarization system designed for lecture data. The baseline system combines agglomerative clustering based on Bayesian information criterion (BIC) with a second clustering using state-of-the-art speaker identification (SID) techniques. Since the baseli...
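The snippet mentions agglomerative clustering based on the Bayesian information criterion. As a hedged illustration, here is the standard ΔBIC merge test commonly used for this purpose (full-covariance Gaussians, penalty weight λ), which is not necessarily the exact variant used in the LIMSI system; the function names are hypothetical.

```python
import numpy as np

def half_n_logdet(x):
    """(N/2) * log|Sigma| for a maximum-likelihood full-covariance Gaussian fit to x (N, d)."""
    n = x.shape[0]
    cov = np.cov(x, rowvar=False, bias=True)  # ML estimate (divides by N)
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * n * logdet

def delta_bic(x1, x2, lam=1.0):
    """BIC(two Gaussians) - BIC(one Gaussian) for frames x1 (N1, d) and x2 (N2, d).
    Positive: keep the clusters separate. Negative: merge them."""
    x = np.vstack([x1, x2])
    n, d = x.shape
    n_extra_params = d + d * (d + 1) / 2      # one extra mean vector plus one covariance matrix
    penalty = 0.5 * lam * n_extra_params * np.log(n)
    return half_n_logdet(x) - half_n_logdet(x1) - half_n_logdet(x2) - penalty

rng = np.random.default_rng(0)
a = rng.normal(size=(500, 2))
b = rng.normal(size=(500, 2))            # same distribution as a
c = rng.normal(loc=5.0, size=(500, 2))   # shifted mean, standing in for a different speaker
print(delta_bic(a, b), delta_bic(a, c))
```

In an agglomerative loop, one would repeatedly merge the pair of clusters with the lowest ΔBIC and stop once every remaining pair has a positive value.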


Multi-stream speaker diarization systems for the meetings domain

In the context of speech and speaker recognition systems, it is well known that the combination of different feature streams can significantly improve their performance. However, the application of multi-stream (MS) techniques to speaker diarization systems has not been extensively studied. In this paper, we address this issue: we formulate different MS techniques, such as feature combination, ...


Comparing Multi-Stage Approaches for Cross-Show Speaker Diarization

Acoustic speaker diarization is investigated for situations where a collection of shows from the same source needs to be processed. In this case, the same speaker should receive the same label across all shows. We compare different architectures for cross-show speaker diarization: the obvious concatenation of all shows, a hybrid system combining first a local clustering stage followed by a glob...


Unsupervised Methods for Speaker Diarization

Given a stream of unlabeled audio data, speaker diarization is the process of determining “who spoke when.” We propose a novel approach to solving this problem by taking advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features and exploiting the inherent variabilities in the data through the use of unsupervised methods. Upon initial evaluation, o...


Integrating online i-vector extractor with information bottleneck based speaker diarization system

Conventional approaches to speaker diarization use short-term features such as Mel Frequency Cepstral Co-efficients (MFCC). Features such as i-vectors have been used on longer segments (minimum 2.5 seconds of speech). Using i-vectors for speaker diarization has been shown to be beneficial as it models speaker information explicitly. In this paper, the i-vector modelling technique is adapted to ...


Journal

Journal title: IEEE Signal Processing Letters

Year: 2023

ISSN: 1558-2361, 1070-9908

DOI: https://doi.org/10.1109/lsp.2023.3279781